
37 research outputs found

Predictive Cyber Situational Awareness and Personalized Blacklisting: A Sequential Rule Mining Approach

Author: CESNET and Masaryk University
CESNET
Fournier-Viger Philippe
Husák Martin
Ma Xiaobo
Ramaki Ali Ahmadian
The Apache Software Foundation
Soska Kyle
Veeramachaneni Kalyan
Zhang Jian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Cybersecurity adopts data mining for its ability to extract concealed and indistinct patterns in data, for example in alert correlation. Inferring common attack patterns and rules from alerts helps defenders understand the threat landscape and supports cyber situational awareness, including the projection of ongoing attacks. In this paper, we explore the use of data mining, namely sequential rule mining, in the analysis of intrusion detection alerts. We employed a dataset of 12 million alerts from 34 intrusion detection systems in 3 organizations, gathered in an alert sharing platform, and processed it using our analytical framework. We mine sequential rules, use them to predict security events, and build a predictive blacklist from those predictions. Thus, the recipients of the data from the sharing platform receive only a small number of alerts about events that are likely to occur, instead of a large number of alerts about past events. The predictive blacklist is only 3% the size of the raw data, and more than 60% of its entries are shown to yield accurate predictions in operational, real-world settings.

Crossref

Univerzitní repozitář Masarykovy univerzity

Figure 10: Run time versus number of concurrent jobs that use the HBase index.

Crossref

BiobankUniverse:Automatic matchmaking between datasets for biobank data discovery and integration

Author: Bart Charbon
Chao Pang
David van Enckevort
Dennis Hendriksen
Fleur Kelpin
Fortier
Hans Hillege
Holub
Jonathan Jetten
Jonathan Wren
Kaisa Silander
Maelstrom Research
Mark de Haan
Merino-Martinez
Miles
Morris A Swertz
Niina Eklund
Norlin
Pang
Pennington
Petr Holub
Scholtens
Shima
Swertz
The Apache Software Foundation
Tommy de Boer
Wolffenbuttel
Wu
Publication date: 15/11/2017

Motivation: Biobanks are indispensable for large-scale genetic/epidemiological studies, yet it remains difficult for researchers to determine which biobanks contain data matching their research questions. Results: To overcome this, we developed a new matching algorithm that identifies pairs of related data elements between biobanks and research variables with high precision and recall. It integrates lexical comparison, Unified Medical Language System ontology tagging and semantic query expansion. The result is BiobankUniverse, a fast matchmaking service for biobanks and researchers. Biobankers upload their data elements and researchers upload their desired study variables; BiobankUniverse then automatically shortlists matching attributes between them. Users can quickly explore matching potential and search for biobanks and data elements that match their research. They can also curate matches and define personalized data-universes.

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Apache HTTP Server Version 1.3.37 for Linux

Author: The Apache Software Foundation
Publication date: 01/01/2006

Biblioteca Digital da Memória Científica do INPE

Energy efficiency of large scale graph processing platforms

Author: Gog Ionel
Gonzalez Joseph E.
Leskovec Jure
Rini
The Apache Software Foundation
Zaharia Matei
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2016

A number of graph processing platforms have emerged recently as a result of the growing demand for graph data analytics on complex, large-scale graph-structured datasets. These platforms have been tailored for iterative graph computations and can offer an order of magnitude performance gain over generic data-flow frameworks like Apache Hadoop and Spark. Nevertheless, the increasing availability of such platforms and the overlap in their functionality necessitate a comparative study of various aspects of the platforms, including applications, performance and energy efficiency. In this work, we focus on the energy efficiency of large-scale graph processing platforms. Specifically, we select two representatives, namely Apache Giraph and Spark GraphX, for the comparative study. We compare and analyze the energy consumption of these two platforms with the PageRank, Strongly Connected Component and Single Source Shortest Path algorithms over five different realistic graphs. Our experimental results demonstrate that GraphX outperforms Giraph in terms of energy consumption. Specifically, Giraph consumes 1.71 times more energy than GraphX on average for the mentioned algorithms.

Crossref

VTT Research System

CERN Document Server

Scaling J2EE™ application servers with the Multi-tasking Virtual Machine

Author: Borman
Czajkowski
Daynes
Dillenberger
Fleury
Jordan
Kuck
Liang
Mauro
Microsoft Corp
Palacz
Sun Microsystems Inc
The Apache Software Foundation
Welsh
Publication venue: 'Wiley'
Publication date: 01/01/2006

Crossref

Big data processing tools: An experimental performance evaluation

Author: Bobade
Boncz
Chen
Floratou
Gounaris
Landset
Li
Mehta
Pirzadeh
Prasad
Rodrigues
The Apache Software Foundation
Publication venue: 'Wiley'

Big Data is currently a hot topic of research and development across several business areas, mainly due to recent innovations in information and communication technologies. One of the main challenges of Big Data is how to handle massive volumes of complex data efficiently. Because data collected from multiple sources is notoriously complex, and is increasingly gathered in large volumes at high velocity, efficient processing mechanisms are needed for data analysis. Motivated by the rapid growth in technology and in the development of tools and frameworks for Big Data, there is much discussion about Big Data querying tools and, specifically, about which are more appropriate for specific analytical needs. This paper describes and evaluates the following popular Big Data processing tools: Drill, HAWQ, Hive, Impala, Presto, and Spark. An experimental evaluation using the TPC-H benchmark of the Transaction Processing Performance Council is presented and discussed, highlighting the performance of each tool according to different workloads and query types. This article is categorized under: Technologies > Computer Architectures for Data Mining; Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Data Preprocessing; Application Areas > Data Mining Software Tools. Funding: FCT (Fundação para a Ciência e Tecnologia), Grant/Award Number: UID/CEC/00319/2013; COMPETE, Grant/Award Number: POCI01-0145-FEDER-007043.

Universidade do Minho: RepositoriUM

Crossref

A QA cycle for teaching programming. A mechanism for automatically posing questions corresponding to learner's skill

Author: ALIC (Advanced Learning Basic Assoc.)
IMS (The Instructional Management Systems)
Kashiwabara
MySQL
Ono
Sun Microsystems
The Apache Software Foundation
Wakabayashi
Publication venue: 'Wiley'

Crossref

How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey

Author: A. Feinberg
A. Moniruzzaman
A. Pavlo
A. Tizghadam
Apache Software Foundation
ArangoDB GmbH
Aurelius LLC
B. F. Cooper
B. Fitzpatrick
Basho Technologies
C. Strozzi
D. Pritchett
F. Chang
G. Vaish
Hibernating Rhinos
Hypertable Inc
J. Gray
J. Jose
J. Klein
LinkedIn
M. A. Olson
M. Burrows
MongoDB Inc.
Orient Technologies
P. Andlinger
P. Wiki
R. C. McColl
R. Casado
R. Cattell
RedisLabs
S. Edlich
S. IT
S. Jouili
S. K. Gajendran
S. Sivasubramanian
SAVI
T. Rabl
Technology
The Apache Foundation
vsChart.com
Y. Abubakar
Publication venue: 'American Institute of Mathematical Sciences (AIMS)'

Crossref

Statistics: Essential Now More Than Ever

Author: Barry D. Nussbaum
Brustein J.
Committee on Professional Ethics
Isaac M.
Lewis M.
National Academies of Sciences Engineering, and Medicine
Nussbaum B.
Pierson S.
Spark Apache Organization
The Apache Software Foundation
The TensorFlow Organization
Wasserstein R.
Publication venue: 'Informa UK Limited'

Crossref